Date: 13 November 2016
This report explores game ratings from IGN. The summary and structure of the dataset are as follows.
## X score_phrase
## Min. : 0 Great :4772
## 1st Qu.: 4657 Good :4741
## Median : 9312 Okay :2945
## Mean : 9312 Mediocre:1959
## 3rd Qu.:13968 Amazing :1804
## Max. :18624 Bad :1269
## (Other) :1134
## title
## Cars : 10
## Madden NFL 07 : 10
## Open Season : 10
## Brain Challenge : 9
## LEGO Star Wars II: The Original Trilogy: 9
## Madden NFL 08 : 9
## (Other) :18567
## url
## /games/aladdin/gba-566703 : 2
## /games/big-league-sports/wii-14275098 : 2
## /games/blur/xbox-360-14222096 : 2
## /games/call-of-duty-modern-warfare-2/ps3-2550: 2
## /games/crash-twinsanity/ps2-667247 : 2
## /games/defiance/pc-71832 : 2
## (Other) :18612
## platform score genre editors_choice
## PC :3370 Min. : 0.50 Action :3797 N:15107
## PlayStation 2:1686 1st Qu.: 6.00 Sports :1916 Y: 3517
## Xbox 360 :1630 Median : 7.30 Shooter :1610
## Wii :1366 Mean : 6.95 Racing :1228
## PlayStation 3:1356 3rd Qu.: 8.20 Adventure:1174
## Nintendo DS :1045 Max. :10.00 Strategy :1071
## (Other) :8171 (Other) :7828
## release_year release_month release_day
## Min. :1996 Min. : 1.000 Min. : 1.0
## 1st Qu.:2003 1st Qu.: 4.000 1st Qu.: 8.0
## Median :2007 Median : 8.000 Median :16.0
## Mean :2007 Mean : 7.139 Mean :15.6
## 3rd Qu.:2010 3rd Qu.:10.000 3rd Qu.:23.0
## Max. :2016 Max. :12.000 Max. :31.0
##
## 'data.frame': 18624 obs. of 11 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ score_phrase : Factor w/ 11 levels "Amazing","Awful",..: 1 1 6 6 6 5 2 1 2 5 ...
## $ title : Factor w/ 12589 levels ".deTuned",".hack//G.U. Vol. 1: Rebirth",..: 5702 5703 9767 7249 7249 11405 2908 4446 2908 11405 ...
## $ url : Factor w/ 18577 levels "/games/0-d-beat-drop/xbox-360-14342395",..: 8390 8387 14319 10813 10812 16931 4271 6526 4270 16932 ...
## $ platform : Factor w/ 59 levels "Android","Arcade",..: 39 39 15 58 36 20 58 33 36 33 ...
## $ score : num 9 9 8.5 8.5 8.5 7 3 9 3 7 ...
## $ genre : Factor w/ 113 levels "","Action","Action, Adventure",..: 65 65 70 95 95 106 39 83 39 106 ...
## $ editors_choice: Factor w/ 2 levels "N","Y": 2 2 1 1 1 1 1 2 1 1 ...
## $ release_year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ release_month : int 9 9 9 9 9 9 9 9 9 9 ...
## $ release_day : int 12 12 12 11 11 11 11 11 11 11 ...
There are 18624 entries in this dataset with 11 features (X, title, url, score_phrase, score, platform, genre, editors_choice, release_year, release_month, release_day). X is just the index, while title and url are specific to each game, so I will not include these three features in the analysis. The features release_year, release_month and release_day can be combined into a single feature called release_date. One factor feature, score_phrase, I ordered myself. Its levels are as follows.
Disaster < Unbearable < Painful < Awful < Bad < Mediocre < Okay < Good < Great < Amazing < Masterpiece
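As a minimal sketch of how this ordering could be declared in base R: the data frame name `games` and its toy rows are hypothetical stand-ins for the real data, and only the level order is taken from the report.

```r
# Level order copied from the report, lowest to highest.
phrase_levels <- c("Disaster", "Unbearable", "Painful", "Awful", "Bad",
                   "Mediocre", "Okay", "Good", "Great", "Amazing",
                   "Masterpiece")

# Hypothetical toy rows standing in for the real score_phrase column.
games <- data.frame(
  score_phrase = c("Amazing", "Great", "Awful", "Good"),
  stringsAsFactors = TRUE
)

# Re-declare the factor as ordered so that comparisons and plot axes
# follow the rating scale instead of alphabetical order.
games$score_phrase <- factor(games$score_phrase,
                             levels = phrase_levels,
                             ordered = TRUE)

# Ordered factors support comparisons such as "at least Great".
at_least_great <- games$score_phrase >= "Great"
```

With an ordered factor, bar plots of score_phrase run from Disaster to Masterpiece without any extra sorting step.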
I am interested in score, genre and platform. I would like to examine which platform a gamer should buy in order to play a large number of high-quality games.
The release_date feature will support the investigation of how games and platforms developed over time, while editors_choice will help me filter for high-quality games.
Yes, I created one new variable, release_date, by combining release_year, release_month and release_day.
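One way this combination might be done in base R is sketched below. The data frame `games` and its three toy rows are hypothetical, and the report does not show which function was actually used.

```r
# Hypothetical toy rows with the three integer date columns from the dataset.
games <- data.frame(
  release_year  = c(2012L, 2012L, 2016L),
  release_month = c(9L, 9L, 2L),
  release_day   = c(12L, 11L, 29L)
)

# Build an ISO 8601 string (YYYY-MM-DD) per row and parse it as a Date.
games$release_date <- as.Date(
  sprintf("%04d-%02d-%02d",
          games$release_year, games$release_month, games$release_day)
)
```

A Date column can then be used directly on a plot axis or grouped by year with `format(games$release_date, "%Y")`.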
Most of the distributions are lightly skewed, so no transformation is required here.
The red dashed line is the average score.
The solid line is the median and the dashed lines are the first and ninth quantiles.
## # A tibble: 12 × 2
## release_month score_median
## <int> <dbl>
## 1 1 7.0
## 2 2 7.5
## 3 3 7.4
## 4 4 7.3
## 5 5 7.1
## 6 6 7.2
## 7 7 7.1
## 8 8 7.5
## 9 9 7.6
## 10 10 7.5
## 11 11 7.3
## 12 12 7.0
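A table of this shape could be computed along the following lines. This is a base R sketch using `aggregate()`, not necessarily the author's method (the tibble output suggests dplyr was used), and the toy scores are hypothetical.

```r
# Hypothetical toy rows: three games released in January, three in September.
games <- data.frame(
  release_month = c(1L, 1L, 1L, 9L, 9L, 9L),
  score         = c(6.5, 7.0, 8.0, 7.0, 7.6, 9.0)
)

# Median score per release month, one row per month in ascending order.
score_by_month <- aggregate(score ~ release_month, data = games, FUN = median)
names(score_by_month)[2] <- "score_median"
```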
The following table shows the top-scoring genres among those with more than 100 games.
## # A tibble: 6 × 3
## genre genre_median_score number_of_game
## <fctr> <dbl> <int>
## 1 RPG 7.9 980
## 2 Action, Adventure 7.7 765
## 3 Action, RPG 7.7 330
## 4 Fighting 7.5 547
## 5 Platformer 7.5 823
## 6 Puzzle 7.5 776
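The filtering and ranking behind a table like this can be sketched in base R as follows. The function name `summarise_genres`, its parameters and the toy data are hypothetical; only the output column names and the threshold of more than 100 games come from the report.

```r
# Rank genres by median score, keeping only genres with more than
# `min_games` games. Defaults mirror the report (more than 100 games, top 6).
summarise_genres <- function(games, min_games = 100, n_top = 6) {
  genre <- as.character(games$genre)
  med <- tapply(games$score, genre, median)  # median score per genre
  cnt <- tapply(games$score, genre, length)  # number of games per genre
  out <- data.frame(
    genre              = names(med),
    genre_median_score = as.numeric(med),
    number_of_game     = as.integer(cnt),
    stringsAsFactors   = FALSE
  )
  out <- out[out$number_of_game > min_games, ]
  # Highest median first; ties broken alphabetically by genre.
  out <- out[order(-out$genre_median_score, out$genre), ]
  rownames(out) <- NULL
  head(out, n_top)
}

# Hypothetical toy data; the threshold is lowered so the example stays small.
games <- data.frame(
  genre = c("RPG", "RPG", "RPG", "Puzzle", "Puzzle", "Puzzle", "Trivia"),
  score = c(8.0, 7.9, 7.5, 7.0, 7.5, 8.0, 9.5)
)
top <- summarise_genres(games, min_games = 2)
```

The count threshold matters: in the toy data the single "Trivia" game has the highest score but is excluded, which is the same reason small genres are left out of the table above.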
While some platforms have many games in recent years (e.g. iPhone) and others were more popular in earlier years, PC and the PlayStation series have a consistent number of games throughout the period of interest. Game score does not depend on the day of release, but it tends to increase slightly with year. By median score, the best month to release a game is September (7.6), and the worst are January and December (7.0 each).
The percentage of editors_choice games tends to increase with time, which is related to the upward trend in score over time. The number of gaming platforms is also increasing. Some of the top-scoring games (score > 8) were not picked as editors_choice, which suggests that the editors have other criteria for their choice.
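The two editors_choice observations can be checked with a few lines of base R. The sketch below uses hypothetical toy rows in a data frame named `games`; the column names and the score > 8 cutoff are taken from the report.

```r
# Hypothetical toy rows with the columns used in this comparison.
games <- data.frame(
  release_year   = c(2006L, 2006L, 2006L, 2006L, 2012L, 2012L),
  score          = c(6.0, 7.5, 8.5, 9.0, 9.2, 8.8),
  editors_choice = c("N", "N", "N", "Y", "Y", "N")
)

# Share of editors_choice games per release year, in percent.
pct_choice <- tapply(games$editors_choice == "Y", games$release_year, mean) * 100

# Top-scoring games (score > 8) that were not picked as editors_choice.
overlooked <- games[games$score > 8 & games$editors_choice == "N", ]
```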
Score and score_phrase are perfectly aligned with each other. This is because score_phrase is a categorical version of score.
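The alignment could be verified by binning score and comparing the result with score_phrase. The cut points below are an assumption (one band per whole score point, with 10 as Masterpiece). They agree with the sample rows in the `str()` output (9 is Amazing, 8.5 is Great, 7 is Good, 3 is Awful), but the report does not state them, and the helper name `score_to_phrase` is hypothetical.

```r
phrase_levels <- c("Disaster", "Unbearable", "Painful", "Awful", "Bad",
                   "Mediocre", "Okay", "Good", "Great", "Amazing",
                   "Masterpiece")

# Assumed bands: [0,1) Disaster, [1,2) Unbearable, ..., [9,10) Amazing,
# and exactly 10 as Masterpiece.
score_to_phrase <- function(score) {
  cut(score, breaks = c(0:10, Inf), labels = phrase_levels,
      right = FALSE, ordered_result = TRUE)
}

# Sample rows taken from the str() output earlier in the report.
games <- data.frame(
  score        = c(9, 8.5, 7, 3),
  score_phrase = c("Amazing", "Great", "Good", "Awful")
)

# TRUE when every binned score matches its recorded phrase.
aligned <- all(as.character(score_to_phrase(games$score)) == games$score_phrase)
```

On the full dataset, any rows where the two disagree would show that the assumed bands are wrong for that score range.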
Most of the mean scores in the popular genre groups are consistent, except for “Action, Adventure”, which was in decline from 1995 to 2007, and “RPG”, which rose rapidly from around 1995 to 1998.
Within the PlayStation series, after a new version is released (i.e. PlayStation 2, 3, 4), games for the previous version (i.e. PlayStation 1, 2, 3) usually perform better!
The total number of games rose from 1997 to 2008 and fell after 2008. This plot gives an overall view of the gaming industry over the period.
PC and the PlayStation series are the most consistent platforms in terms of number of games. If gamers want to have many gaming options available, PC and PlayStation are their best choice.
Throughout the years, the mean scores for most genres are fairly constant. The exceptions are genres related to RPG, which are on the rise, and genres involving action, which are in decline.
This dataset is about game ratings from ign.com, a famous game website. It covers over 18,000 games from 1996 to 2016 and spans most of the gaming platforms and game genres available in this period. I started exploring the dataset by plotting the frequency of each variable, which gave me an overall understanding of the data. This revealed the overall trend in the gaming industry over the period: the number of games peaked in 2008 and has been declining since then. Next, I compared pairs of variables by means of scatter plots, line plots, stacked bar plots and box plots, in order to investigate the evolution of game score, platform and genre. Lastly, I plotted multivariate graphs to examine three variables simultaneously. By exploring this dataset, trends in genre can also be seen. I can also see which platforms are transient and which have stood the test of time.